SYSTEM AND METHOD FOR PRE-PROCESSING INFORMATION USED BY AN AUTOMATED ATTENDANT



CROSS-REFERENCE TO RELATED PATENT APPLICATION

[0001] This patent application is a continuation application of and claims priority 
to U.S. Patent Application Serial No. 10/041,620 filed January 10, 2002, which
claims benefit of U.S. Provisional Patent Application Serial No. 60/300,867 filed
June 27, 2001.

TECHNICAL FIELD 

[0002] The present invention relates to automatic directory assistance. In 
particular, the present invention relates to systems and methods for 
automatically pre-processing entries contained in an informational database 
used by an automated attendant. 

BACKGROUND OF THE INVENTION 

[0003] In recent years, automated attendants have become very popular. Many 
individuals or organizations use automated attendants to automatically provide 
information to callers and/or to route incoming calls. An example of an 
automated attendant is an automated directory assistant that automatically 
provides a telephone number, address, etc. for a business or an individual in 
response to a user's request. 

[0004] Typically, a user places a call and reaches an automated directory 
assistant (e.g., an Interactive Voice Recognition (IVR) system) that prompts the
user for desired information and searches an informational database (e.g., a 
white pages listings database) for the requested information. The user enters 
the request, for example, a name of a business or individual via a keyboard, 
keypad or spoken inputs. The automated attendant searches for a match in the
informational database based on the user's input and may output a voice
synthesized result if a match can be found. 

[0005] When offering automated directory assistance, the informational 
database may be used for two purposes. One purpose may be to create 
vocabularies and grammars for the speech recognition engine that recognizes 
the caller's request and a search engine that searches for a match. The other 
purpose may be to generate a speech-synthesized output of the requested 
listing to the caller. 

[0006] The information or listings contained in these informational databases 
may contain abbreviations, acronyms, errors, or other deviations that may 
prevent the search engine from recognizing the listing and the speech
synthesizer from pronouncing the listing so that it is understood by the caller.
For example, the system may not be able to recognize or pronounce the 
abbreviation "CLD HARBR SPRNG" to mean "Cold Harbor Springs." In another 
example, the speech recognition engine may not understand a caller's request 
if the caller uses the abbreviation "N - C - double A" to mean "N - C - A - A." 

[0007] Additionally, directory listings are typically optimized for visual 
presentation, not for conversation. Thus, the word order is often reversed and 
acronyms are used extensively. Such deviations may further prevent the listing 
from being recognized. For example, the listing "Smith Joe S., MD" may not be 
recognized if the caller says "Doctor Joe S. Smith." 

[0008] Such deviations in the listings database and/or in the way callers may
pronounce a requested listing may prevent the caller's request for information 
from being completed automatically or may delay its completion. 

[0009] One approach to solving this problem involves having an operator
personally inspect each database entry and fine-tune each listing.
This conventional technique can be impractical when hundreds of thousands
or even millions of listings are not only involved, but may also be in a
continual state of flux, as is the case with telephone directory listings.
Additionally, errors, abbreviations, acronyms, etc. may require intervention of an
operator, which can delay the process and prevent the complete automation
that is desirable.

SUMMARY OF THE INVENTION 

[0010] Embodiments of the present invention concern a method and system for 
pre-processing entries in directory listings. An automated attendant or 
automated directory listings assistant may use the pre-processed entries. A 
first directory listings including one or more fields may be received. The one or 
more fields may be populated with entries including one or more symbol strings. 
A second directory listings including one or more fields may be received. The 
one or more fields of the second directory listings may be populated with entries 
including one or more symbol strings. Entries in the one or more fields of the 
first directory listings may be correlated with entries in the corresponding one or 
more fields of the second directory listings. Entries, in the one or more fields of 
the first directory listings, which do not correlate with entries in the 
corresponding one or more fields of the second directory listings may be 
identified. The identified entries may be processed using a rule set 
corresponding to the field in which the entry is located. Based on the rule set, a 
corresponding confidence level for the processed entries may be determined. 
The processed entries having the corresponding confidence level meeting or 
exceeding a threshold may be automatically modified. The automatically 
modified entries may be outputted for processing. In alternative embodiments 
of the present invention, the processed entries having the corresponding 
confidence level below the threshold may be marked for operator confirmation. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] Embodiments of the present invention are illustrated by way of example, 
and not limitation, in the accompanying figures in which like references denote 
similar elements, and in which:

[0012] FIG. 1 is a block diagram of a directory listings pre-processing system in
accordance with an embodiment of the present invention; 

[0013] FIG. 2 illustrates a block diagram of a listings pre-processing device in 
accordance with an embodiment of the present invention; 

[0014] FIG. 3 is a block diagram of a graphical user interface in accordance with
an exemplary embodiment of the present invention; and 

[0015] FIG. 4 is a flowchart showing a listings pre-processing method in
accordance with an exemplary embodiment of the present invention. 

DETAILED DESCRIPTION 

[0016] Embodiments of the present invention relate to an automated and/or 
semi-automated system that can pre-process directory listings or other
information so that the information can be automatically recognized and/or 
presented to a user. Embodiments of the present invention may utilize a series 
of pre-processing steps to, for example, correct typographical errors, expand 
abbreviations to be context sensitive, correct order of words, expand acronyms, 
and/or specify how acronyms, proper names (people and places) and/or other 
information should be pronounced. 

[0017] The listings pre-processing system, in accordance with embodiments of 
the present invention, may process listings entries according to a rule set. For 
example, the system may generate a pre-processed listings output and a 
corresponding confidence level for each pre-processed listing. The confidence 
level may be generated based on the rule set to indicate the level of certainty 
with which the listing was corrected or preprocessed. If, for example, a 
processed listing has a corresponding confidence level above or at a 
predetermined threshold, the listing may be sent directly to an automated 
attendant for immediate use in speech recognition and/or speech synthesis. 
Optionally and/or additionally, such high confidence outputs may be sent to a 
storage device for use at a later time and/or to any other device.

[0018] Alternatively, in embodiments of the present invention, if a processed
listing has a corresponding confidence level below a predetermined threshold, 
the processed listing may be sent immediately to, for example, an operator for 
confirmation and/or correction. Optionally and/or additionally, such low 
confidence outputs may be sent to a storage device for use at a later time 
and/or to any other device. 
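
By way of illustration only, the threshold-based routing described in paragraphs [0017] and [0018] might be sketched as follows. The class name, function name, list names, and the numeric confidence scale (0.0 to 1.0) are assumptions made for this example and are not specified by the embodiments.

```python
from dataclasses import dataclass


@dataclass
class ProcessedListing:
    original: str      # entry as found in the informational database
    modified: str      # entry after rule-based pre-processing
    confidence: float  # certainty assigned by the rule set (assumed 0.0 to 1.0)


def route_listing(listing, threshold, high_confidence, low_confidence):
    """Append the listing to the high-confidence output if its confidence
    is at or above the threshold; otherwise queue it for an operator."""
    if listing.confidence >= threshold:
        high_confidence.append(listing)
    else:
        low_confidence.append(listing)


high_confidence, low_confidence = [], []
route_listing(ProcessedListing("Yale Dr.", "Yale Drive", 0.95),
              0.8, high_confidence, low_confidence)
route_listing(ProcessedListing("John Michael M.D.", "Doctor Michael John", 0.4),
              0.8, high_confidence, low_confidence)
```

In this sketch the first listing would be available for immediate use by an automated attendant, while the second would await operator confirmation and/or correction.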

[0019] Embodiments of the present invention may include a graphical user 
interface (GUI) for presenting, to the operator, the low confidence or 
questionable listings together with, for example, suggested possible corrections 
for selection by the operator. Using the GUI, the operator may modify the 
questionable listings based on one or more rules included in the pre-determined 
rule set or, alternatively, the operator may modify the questionable listing based 
on the operator's personal discretion. In embodiments of the present invention, 
the operator may create additional rules that may be used to pre-process the 
listings. These additional rules, created by the operator, may be included in the 
predetermined rule set to pre-process the listings in accordance with 
embodiments of the present invention. 

[0020] FIG. 1 is a block diagram of a directory listings pre-processing system 
100 according to an exemplary embodiment of the present invention. The 
directory listings pre-processing system 100 may include a listings pre- 
processing device (LPPD) 120 that may operate in accordance with 
embodiments of the present invention. 

[0021] In embodiments of the present invention, the LPPD 120 may receive 
information entries from an informational database 110. For example, the 
informational database 110 may be a white pages listings database that may 
include a plurality of fields including one or more information entries. The 
plurality of fields may include names of individuals and/or businesses, 
corresponding street addresses, township, city, state and/or country names, zip 
codes, telephone numbers, e-mail addresses, web site addresses, and/or any
other information relating to the individuals and/or businesses. It is recognized
that the database 110 may include any type of information that may be used by
automated attendants to provide a variety of products and/or services to users.
It is also recognized that embodiments of the present invention may be used to
pre-process any type of information to correct errors, expand abbreviations, add
abbreviations, expand acronyms, add acronyms, etc.

[0022] In embodiments of the present invention, entries in the various 
databases, referred to or described herein, may include one or more symbol 
strings. Symbol strings as used herein may be text or character strings that 
represent individual or business listings and/or other information. 

[0023] Although FIG. 1 shows the informational database 110 as a single 
database, it is recognized that the database 110 may be a plurality of different 
databases where each database may contain a specific type of information. For
example, one type of the informational database 110 may contain only 
individual and/or business names, while another type may contain only 
addresses, while yet another type may contain names and corresponding 
phone numbers and/or corresponding township names, etc. 

[0024] The database 110 may be a typical information repository such as a white
pages listings database used by automated directory assistants to search for 
and provide information to callers. Typically, the database 110 may contain at 
least some entries that may contain errors or other deviations that may prevent 
the entry from being recognized automatically by, for example, a speech 
recognizer and/or pronounced by a speech synthesizer. For example, the 
database 110 may contain entries, in one or more fields, that contain spelling 
errors, typographical errors, acronyms, abbreviations, improper or varying 
pronunciation, improper or varying word order and/or other informalities that 
may prevent entries from being recognized by a speech recognizer and/or
pronounced by a speech synthesizer.

[0025] In embodiments of the present invention, LPPD 120 may receive and/or
retrieve informational entries from the database 110 and may pre-process the
entries based on one or more pre-determined rule sets, in accordance with
embodiments of the present invention (to be described below in more detail).
Pre-processing the entries of database 110, in accordance with embodiments of the
present invention, may reduce the delays and/or inefficiencies that may
otherwise be encountered by, for example, an automated directory assistant
when searching for a user's request.

[0026] In embodiments of the present invention, after the LPPD 120 pre- 
processes the entries from database 110, the pre-processed entries may be 
forwarded to, for example, the automated attendant 190 for storage and/or 
immediate use. 

[0027] In embodiments of the present invention, the pre-processed entries may 
be stored in the pre-processed listings database 132 located in, for example, 
the speech recognition system 130 of automated attendant 190. The grammar 
generator 134 may generate one or more grammars using the pre-processed 
entries stored in pre-processed listings database 132. The grammar generator 
134 may be any type of known hardware and/or software device for generating 
grammars. The generated grammars may be stored in the 
vocabulary/grammars database 136. The automated attendant 190 may utilize 
the grammars generated based on the pre-processed listings to search for the 
user's request for information. 

[0028] In accordance with embodiments of the present invention, the automated 
attendant 190 may further utilize the pre-processed entries received from LPPD 
120 to generate a spoken output for the requested information using speech 
synthesizer 140. The pre-processed entries may be stored in pronunciation 
dictionary 142 and forwarded to the speech synthesis device 144. The speech 
synthesis device 144 may be any type of speech synthesizer known in the art. 
The pronunciation dictionary 142 may include at least one pronunciation of
each word of the pre-processed entries received from the LPPD 120. The
speech synthesis device 144 may generate sound files based on the
pre-processed listings received from LPPD 120 and store the generated sound files in
sound files database 146. The generated sound files from database 146 may 
be output to the user by automated attendant 190 to complete the user's 
request for information. 

[0029] The automated attendant 190 may include other components and/or 
devices that are not shown for simplicity. The automated attendant 190 may 
engage in further dialog with the user to provide additional information, and/or 
to conduct additional searches in the event the user is not satisfied by the 
results provided by the automated attendant 190. Additionally, the automated 
attendant may provide the user with other services such as initiating a call on 
the user's behalf based on the searched information and/or other known 
automated services. 

[0030] FIG. 2 is a block diagram of the LPPD 120 in accordance with an 
embodiment of the present invention. The LPPD 120 may include a
pre-processor 220, a reference database 270, a rules database 211, a
non-confirmed listings database 240 and a confirmed pre-processed listings
database 250. It is recognized that any suitable hardware and/or software may 
be used by one of ordinary skill in the art to configure and/or implement the 
LPPD 120 in accordance with embodiments of the present invention. 

[0031] In embodiments of the present invention, the pre-processor 220 may 
include, for example, a word order normalizer 221, a street name expander 223,
and/or a township corrector 225. The pre-processor 220 may include additional 
components such as a spelling checker, abbreviation expander, acronym 
detector, pronunciation generator, grammar checker, and/or corrector, etc. (not 
shown).

[0032] In embodiments of the present invention, the plurality of databases (e.g.,
databases 270, 211, 240, 250, etc.) shown can be stored in a memory device
that may be located internal to and/or external to the LPPD 120. 

[0033] In embodiments of the present invention, LPPD 120 may receive, for 
example, a white pages listings from informational database 110 for
pre-processing. The white pages listings from database 110 may contain a plurality
of fields that contain a plurality of entries. The white pages listings database 
110 may include such fields as individual and/or business names, 
corresponding street addresses, townships, zip codes, etc. It is recognized that 
the white pages listings database 110 may include additional fields containing, 
for example, e-mail addresses, web page addresses, phone numbers, etc. 

[0034] In embodiments of the present invention, the listings pre-processing 
device 120 receives the plurality of entries from, for example, the white pages 
listings database 110 and may pre-process the entries according to one or 
more rules included in the rules database 211. The pre-processed entries may
be forwarded to, for example, an automated attendant or to an operator. The
listings may be pre-processed periodically or may be pre-processed as desired
by, for example, an operator. 

[0035] In embodiments of the present invention, the word order normalizer 221 
may correct the order of names included in the "Names" field of listings 
database 110 based on corresponding rules in the rules database 211. The
normalizer 221 may recognize the names field from the plurality of fields
included in the database 110 using, for example, clues in the corresponding
entries to identify that the listing corresponds to a person's name. For example, 
the normalizer 221 may look for titles such as doctor, MD, accountant, Esq., 
etc. appearing in the entry to identify that the listing represents an individual's 
name. After the field is recognized, the normalizer 221 may verify and correct, 
if necessary, the order of the names in the corresponding field.

[0036] In embodiments of the present invention, the normalizer 221 may
correlate the first and the last names as appearing in each entry of the
listings database 110 to corresponding entries in the reference database 270.
The normalizer 221 may identify entries in the database 110 that correspond to
a name and title of an individual. The reference database 270 may be a
pre-verified database that may contain, for example, a list of the top N (e.g.,
10,000) most frequent first names, and top N most frequent last names. The normalizer 221
then may correlate each word in the listing to the reference database 270, and 
determine which is likely to be a given name and which is the family name, and 
change the order of the words accordingly. In alternative embodiments of the 
present invention, the reference database 270 may be, for example, a pre- 
verified database that is used by, for example, a postal service. In this case, 
the reference database 270 may contain names, street names, and full 
addresses, etc. of individuals and/or businesses in a particular community, 
town, city, state, and/or country. It is also recognized that reference database 
270 can be any type of database containing verified entries that can be used to 
verify entries included in any other type of database. 

[0037] In embodiments of the present invention, after the normalizer 221 
identifies entries in the database 110 that do not correlate with corresponding 
entries in the reference database 270, the normalizer 221 may process those entries in
accordance with the corresponding rule in the rules database 211. The order
normalizer 221 may identify, based on the correlation with the reference
database 270, entries in the listings database 110 that have, for example,
inverted or otherwise errant entries. 

[0038] For example, during a pre-processing step, normalizer 221 may receive 
an entry such as "Smith, John M.D." specified in the names field. The 
normalizer 221 may confirm that the entry belongs in the names field based on, 
for example, the title "M.D." included in the entry. Based on a rule set for the 
word order normalizer 221 contained in the rule set database 211, the
normalizer 221 may compare the entries "Smith" and "John" with entries
contained in the given and family names fields of the reference database 270. 

[0039] In embodiments of the present invention, the reference database 270 
may be, for example, a list of the top N (e.g., 10,000) most frequent first names, and
top N most frequent last names. The normalizer 221 may find a match for the 
entry "Smith" in the frequent family names field, and for "John" in the frequent 
given names field in the reference database 270. The normalizer 221 may 
determine that the name or word order of the entry should be re-arranged to 
read "John Smith." 

[0040] In addition, based on a rule set for the normalizer 221 contained in the 
rule set database 211, the abbreviation "M.D." may be changed or expanded to
"Doctor." Accordingly, the normalizer 221 may modify the entry "Smith, John 
M.D." to "Doctor John Smith." 
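
As a non-limiting illustration, the word order normalization and title expansion of paragraphs [0038] to [0040] might be sketched as follows. The small name sets stand in for the reference database 270, and the function name, the set contents, and the tokenization approach are assumptions made for this example only.

```python
import re

# Hypothetical stand-ins for the frequent-name lists of reference database 270.
GIVEN_NAMES = {"john", "joe", "michael"}
FAMILY_NAMES = {"smith", "jones"}

# Hypothetical title rule: key is the title with periods removed, lowercased.
TITLE_EXPANSIONS = {"md": "Doctor"}


def normalize_name(entry):
    """Expand a recognized title and move a leading family name to the end."""
    tokens = [t for t in re.split(r"[,\s]+", entry.strip()) if t]
    title = None
    names = []
    for token in tokens:
        key = token.replace(".", "").lower()
        if key in TITLE_EXPANSIONS:
            title = TITLE_EXPANSIONS[key]
        else:
            names.append(token)
    # Swap only when the first word correlates with a family name and the
    # second word correlates with a given name.
    if (len(names) >= 2
            and names[0].lower() in FAMILY_NAMES
            and names[1].lower() in GIVEN_NAMES):
        names = names[1:] + names[:1]
    return " ".join(([title] if title else []) + names)


print(normalize_name("Smith, John M.D."))  # prints: Doctor John Smith
```

An entry already in spoken order, such as "John Smith", passes through unchanged because its first word does not correlate with the family names list.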

[0041] In embodiments of the present invention, after the entry has been 
modified, the pre-processor 220 may determine, based on the rules used to 
modify the entry from rules database 211, a confidence level for the
corresponding pre-processed entry. The determined confidence level may be 
compared to a pre-determined threshold that may be set for one or more 
entries. It is recognized that separate threshold levels can be set for a particular
entry or particular types of entries. For example, entries in the "Names" field may
have one threshold and entries in the "Address" field may have another
threshold. If a pre-processed entry has a corresponding confidence level above 
the corresponding threshold (also referred to herein as being processed with a 
high level of confidence), the modified entry may be stored in the confirmed pre- 
processed listings database 250 and/or may be forwarded directly to the 
automated attendant 190. 
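
For illustration, field-specific thresholds of the kind described in paragraph [0041] might be kept in a simple lookup table. The numeric values, the default threshold, and the function name are assumptions for this example; the comparison treats a confidence level equal to the threshold as high confidence, consistent with the "meeting or exceeding" language of the summary.

```python
# Illustrative per-field thresholds; the numeric values are assumptions.
FIELD_THRESHOLDS = {"Names": 0.85, "Address": 0.70}
DEFAULT_THRESHOLD = 0.80  # used for fields without a specific threshold


def is_high_confidence(field_name, confidence):
    """Compare a confidence level against the threshold set for its field."""
    threshold = FIELD_THRESHOLDS.get(field_name, DEFAULT_THRESHOLD)
    return confidence >= threshold
```

Under these assumed values, a confidence level of 0.75 would be treated as high for an entry in the "Address" field but low for an entry in the "Names" field.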

[0042] In embodiments of the invention, the confidence levels can be 
determined dynamically, based upon the rules and degree of correlation with 
the reference database 270. For example, the entry "John Michael M.D." may
be converted to "Doctor Michael John" with low confidence because both "John"
and "Michael" are listed as frequent given names in the reference database
270. The entry "Smith John J. MD" may be converted to "Doctor John J.
Smith" with a high confidence level, since "John" is a likely given name and
"Smith" is a likely family name according to the reference database 270. 
Additionally, this entry may have a high confidence level based on a rule that, 
for example, says that a middle initial is likely to follow a given name, as 
opposed to family name. 
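
One possible way to determine such a confidence level dynamically is sketched below. The scoring weights, the function name, and the name sets are purely hypothetical; the embodiments do not prescribe any particular scoring formula, only that the level reflect the rules applied and the degree of correlation with the reference database 270.

```python
# Hypothetical stand-ins for the frequent-name lists of reference database 270.
GIVEN_NAMES = {"john", "joe", "michael"}
FAMILY_NAMES = {"smith", "jones"}


def inversion_confidence(first_word, second_word, third_word=None):
    """Estimate how certain it is that the entry is in 'family given' order
    and should therefore be inverted. All weights are illustrative."""
    first, second = first_word.lower(), second_word.lower()
    score = 0.5
    if first in FAMILY_NAMES and first not in GIVEN_NAMES:
        score += 0.2  # first word correlates only with family names
    if second in GIVEN_NAMES and second not in FAMILY_NAMES:
        score += 0.2  # second word correlates only with given names
    if first in GIVEN_NAMES:
        score -= 0.3  # first word may itself be a given name: ambiguous
    # Rule: a middle initial is likely to follow a given name.
    if third_word and len(third_word.rstrip(".")) == 1:
        score += 0.1
    return max(0.0, min(1.0, score))
```

With these assumed weights, inverting "John Michael" scores below one half, while inverting "Smith John J." scores near the top of the scale, mirroring the two examples in paragraph [0042].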

[0043] In alternative embodiments of the present invention, if a pre-processed 
entry has a corresponding confidence level below the corresponding threshold 
(also referred to herein as being processed with a low level of confidence), the 
modified entry may be forwarded to, for example, the non-confirmed listings 
database 240. The non-confirmed listings database 240 may be accessed by 
an operator using an operator interface 180. The operator may check the entry 
to determine if the entry is correct or may modify the entry in accordance with 
embodiments of the present invention (to be described below in more detail). 

[0044] In embodiments of the present invention, street name expander 223 may 
receive and pre-process entries in the "Address" field of the listings database 
110 based on corresponding rules in the rules database 211. The street name
expander 223 may identify entries in the database 110 that do not match or 
correlate with the corresponding entries in the reference database 270. For 
example, the entries located in the address field may include street names that 
may include abbreviations that may need to be expanded, and/or typographical 
errors and/or misspellings that need to be corrected. The street name 
expander 223 may receive all of the entries in the address field from database 
110 and correlate the street name in each entry of database 110 to street
name entries located in the reference database 270 to correct any deviations in
the database 110.

[0045] According to the rule set in the rules database 211, the street name
expander 223 may correlate only entries with respect to a township, city, etc. in
which the street address is located. In alternative embodiments of the present
invention, the street name expander 223 may correlate all of the entries in the
database 110 with corresponding entries in reference database 270. The street
name expander 223 may compare street address entries in the listings
database 110 with corresponding field entries in the reference database 270.

[0046] If the expander 223 identifies entries in database 110 that do not 
correlate with corresponding entries in the reference database 270, the 
expander 223 may, based on the corresponding rules 211, modify such entries
as needed. If a close match between a corresponding entry of the database 
110 and reference database 270 is found, the street name in the database 110 
may be modified. For example, the entry "Yale Dr." may be modified to "Yale 
Drive" based on a match found in the reference database 270. Additionally, 
street name expander 223 may modify the entry to correct other errors that may 
be included in the entry. 
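
As a hedged illustration, the abbreviation expansion and close matching of paragraph [0046] might be sketched using approximate string matching from the Python standard library (difflib). The reference street list, the suffix table, the cutoff value, and the use of the similarity ratio as a confidence level are all assumptions for this example.

```python
import difflib

# Hypothetical street names standing in for reference database 270.
REFERENCE_STREETS = ["Yale Drive", "Harvard Avenue", "Cold Harbor Road"]

# Hypothetical address-field rule set: abbreviation (no period) to expansion.
SUFFIXES = {"dr": "Drive", "ave": "Avenue", "rd": "Road", "st": "Street"}


def expand_street(entry, reference=REFERENCE_STREETS, cutoff=0.8):
    """Return (street_name, confidence). The entry is expanded, then
    correlated with the reference list; unmatched entries are returned
    unchanged with a confidence of 0.0."""
    words = entry.split()
    expanded = " ".join(SUFFIXES.get(w.rstrip(".").lower(), w) for w in words)
    if expanded in reference:
        return expanded, 1.0
    matches = difflib.get_close_matches(expanded, reference, n=1, cutoff=cutoff)
    if matches:
        ratio = difflib.SequenceMatcher(None, expanded, matches[0]).ratio()
        return matches[0], ratio
    return entry, 0.0


print(expand_street("Yale Dr."))  # prints: ('Yale Drive', 1.0)
```

Because the suffix table is applied only to entries from the address field, "Dr." is expanded to "Drive" here, whereas a rule set for the names field might treat the same abbreviation differently. A misspelled entry such as "Yael Dr." would also resolve to "Yale Drive" in this sketch, but with a confidence below 1.0.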

[0047] If the modification is performed with a high level of confidence, the 
modified entry may be sent to the confirmed pre-processed listings database 
250 for storage and/or sent to the automated attendant 190. Alternatively, if the 
modification is performed with a low level of confidence, the modified entry may 
be forwarded to the non-confirmed listings database 240 for operator 
confirmation and/or modification as described herein. 

[0048] In embodiments of the present invention, township corrector 225 may
receive and pre-process entries in the "Township" field of the listings database
110 based on corresponding rules in the rules database 211. As used herein,
the term "township" may refer to the community, town, city, state, etc. of
interest. In embodiments of the present invention, township corrector 225 may
correlate entries in the township field of white pages listings database 110 with
corresponding entries in the reference database 270.

[0049] In embodiments of the present invention, the township corrector 225 may
employ corresponding rules from rules database 211 to pre-process the
township entries. The township corrector 225 may identify entries in the
database 110 that do not match or correlate with the corresponding entries
in the reference database 270. For example, based on the rules, the township
corrector 225 may correlate the township entries in database 110 with
corresponding entries in the reference database 270 to expand abbreviations, 
and/or to correct typographical errors and/or misspellings, or to remove 
extraneous information included in the township entry. For example, the 
township corrector 225 may remove extraneous information, for example, words 
such as township, city, etc. after a valid name, and/or hyphens or other 
punctuation that does not appear in the corresponding township entries in the 
reference database 270. 
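
The removal of extraneous information described in paragraph [0049] might, for example, be sketched as follows. The reference township table, the list of extraneous words, and the function name are hypothetical, and the sketch assumes that the reference township names themselves contain no punctuation.

```python
import re

# Hypothetical township names standing in for reference database 270,
# keyed by lowercase name for case-insensitive correlation.
REFERENCE_TOWNSHIPS = {
    "cold harbor springs": "Cold Harbor Springs",
    "princeton": "Princeton",
}

# Words that may be dropped when they trail a valid name (illustrative).
EXTRANEOUS_WORDS = {"township", "city"}


def clean_township(entry):
    """Strip punctuation and trailing extraneous words, then look the result
    up in the reference table. Returns None when no correlation is found."""
    text = re.sub(r"[^\w\s]", " ", entry)  # hyphens, periods, etc. to spaces
    words = text.split()
    while len(words) > 1 and words[-1].lower() in EXTRANEOUS_WORDS:
        words.pop()
    return REFERENCE_TOWNSHIPS.get(" ".join(words).lower())


print(clean_township("Princeton Township"))  # prints: Princeton
```

A result of None in this sketch would correspond to an entry that could not be corrected automatically and might instead be forwarded for operator review.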

[0050] In embodiments of the present invention, the township corrector 225 may 
use, for example, a zip code entry to correlate the township name in the database
110 with corresponding entries in the reference database 270. 

[0051] If the township corrector 225 identifies entries in database 110 that do
not correlate with corresponding entries in the reference database 270, the
township corrector 225 may, based on the corresponding rules 211, modify
such entries as needed. If the modification is performed with a high level of 
confidence, the modified entry may be sent to the confirmed pre-processed 
listings database 250 for storage and/or sent to the automated attendant 190. 
Alternatively, if the modification is performed with a low level of confidence, the 
modified entry may be forwarded to the non-confirmed listings database 240 for 
operator confirmation and/or modification as described herein. 

[0052] It is recognized that spelling and/or punctuation/grammar errors may be 
corrected as the components of the pre-processor 220 process the entries of 
database 110 as described above. Alternatively, the pre-processor 220 may
also include a separate spelling checker and/or grammar checker (not shown)
to correct spelling and/or grammar errors in the entries. 

[0053] FIG. 3 is a block diagram illustrating the use of an operator interface 180 
in accordance with an embodiment of the present invention. The operator 
interface 180 may be a GUI used by an operator to confirm and/or modify 
entries pre-processed by pre-processor 220 with a low confidence level. 
Additionally, the operator interface 180 may be used to edit and/or add rules to 
the rules database 211. 

[0054] In embodiments of the present invention, if the pre-processor 220 
determines, based on the rules in database 211, that an entry in database 110
was modified or pre-processed with a low confidence level, the entry is 
forwarded to the non-confirmed listings database 240, as shown in FIG. 3. In 
embodiments of the present invention, using interface 180 an operator may 
access the non-confirmed entries residing in database 240 and determine 
whether the modifications are correct. If the low confidence modifications are 
determined to be correct by the operator, the modified entries may be sent to 
the confirmed pre-processed listings database 250 for storage and/or to the
automated attendant 190. 

[0055] Alternatively, in embodiments of the present invention, if the operator 
determines that one or more entries in the non-confirmed listings database 240 
are not correct, the operator using operator interface 180 may be presented 
with a plurality of suggested corrections that had been generated by the system 
using the rules in rules database 211, that may be used to modify the entry.
Using the input interface 300, the operator may select one of the choices 
presented by the GUI 180. The operator's choice may be captured by the GUI 
180 and the pre-processor may pre-process the entry in accordance with the 
selected correction. Alternatively, the operator may modify the entry at the 
operator's discretion. The modified entry may be sent to the confirmed
pre-processed listings database 250 for storage and/or to the automated attendant
190.

[0056] In alternative embodiments of the present invention, the operator may 
use the GUI 180 to compile a new rule set and/or modify an existing rule set. 
The newly compiled rule set may be captured by the GUI 180 and the
pre-processor may pre-process the entry in accordance with the newly compiled
rule set. If a new rule is compiled, the operator may also choose the scope of
application for the new rule. That is, the GUI 180 may present the operator
with selections relating to how the new or modified rules should be applied.
For example, the operator may select that the newly compiled rule should be
applied globally, for the current case only, for future cases, for previous cases, 
for all names, for all states, for all townships and/or any other case desirable. 
Using the input interface 300, the operator may select one of the choices 
presented by the GUI 180. The operator's choice may be captured by the GUI 
180 and the pre-processor may apply the rule in accordance with the operator's 
selection. 
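
By way of illustration only, a scope of application attached to an operator-compiled rule may be sketched in Python as follows. The Scope values, the ScopedRule structure, the listing keys ("id", "name", "state", "township") and the reading of a state or township scope as restricting the rule to one named state or township are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    GLOBAL = "global"              # apply to every listing
    CURRENT_CASE = "current_case"  # apply only to the entry under review
    STATE = "state"                # assumed: restrict to one state
    TOWNSHIP = "township"          # assumed: restrict to one township


@dataclass
class ScopedRule:
    find: str
    replace: str
    scope: Scope
    scope_value: str = ""  # listing id, state or township, per the scope


def applies(rule, listing):
    """Decide whether the rule's scope covers this listing."""
    if rule.scope is Scope.GLOBAL:
        return True
    key = {
        Scope.CURRENT_CASE: "id",
        Scope.STATE: "state",
        Scope.TOWNSHIP: "township",
    }[rule.scope]
    return listing.get(key) == rule.scope_value


def apply_rule(rule, listing):
    """Return a modified copy of the listing if the rule is in scope."""
    if not applies(rule, listing):
        return listing
    updated = dict(listing)
    updated["name"] = listing["name"].replace(rule.find, rule.replace)
    return updated
```

Under these assumptions, a rule created with `ScopedRule("Twp", "Township", Scope.STATE, "NJ")` would modify only listings whose state is "NJ" and leave all other listings untouched.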

[0057] FIG. 4 is a flowchart illustrating a listings pre-processing method in 
accordance with an exemplary embodiment of the present invention. As shown 
in step 4010, a pre-processor 220 of listings pre-processing device 120 
receives a first directory listing that includes one or more fields. For example, 
the first directory listing may be a white pages listing from database 110. The 
one or more fields included in the first directory listings may contain one or 
more entries and the entries may contain one or more symbol strings. The pre- 
processor receives a second directory listing that also includes one or more 
fields, as shown in step 4020. The second directory listing may be, for 
example, a reference database 270. The one or more fields included in the second
directory listings may contain one or more entries and the entries may contain
one or more symbol strings.

[0058] After the pre-processor 220 receives the first and second directory
listings, the pre-processor 220 correlates entries in the one or more fields of the 
first directory listings with entries in the corresponding one or more fields of the 
second directory listings, as shown in step 4030. As shown in step 4040, the 
pre-processor 220 identifies entries, in the one or more fields of the first 
directory listings, which do not correlate with entries in the corresponding one or 
more fields of the second directory listings. The identified entries are processed 
using a rule set corresponding to the field in which the entry is located, as 
shown in step 4050. The pre-processor 220, based on the corresponding rule 
set, determines a corresponding confidence level for the processed entries, as 
shown in step 4055. 
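
By way of illustration only, steps 4030 through 4055 may be sketched in Python as follows. The Rule structure, the per-rule confidence values, the exact-match test used for correlation, and the choice to take the lowest confidence among the rules applied are assumptions made for this sketch; the specification does not prescribe them.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str       # regular expression matched against the entry
    replacement: str   # text substituted for each match
    confidence: float  # hypothetical per-rule confidence in [0, 1]


# Hypothetical rule sets, one per field.
RULE_SETS = {
    "name": [Rule(r"\bDr\b\.?", "Doctor", 0.9)],
    "street": [
        Rule(r"\bSt\b\.?", "Street", 0.95),
        Rule(r"\bAve\b\.?", "Avenue", 0.95),
    ],
}


def process_entry(field, entry):
    """Apply the field's rule set; return (processed_entry, confidence)."""
    applied = []
    for rule in RULE_SETS.get(field, []):
        entry, count = re.subn(rule.pattern, rule.replacement, entry)
        if count:
            applied.append(rule.confidence)
    # Assumption: no applicable rule means no confidence in the entry.
    return entry, (min(applied) if applied else 0.0)


def preprocess(first_listings, reference):
    """Correlate entries field by field and process those that do not match.

    first_listings: list of dicts mapping field name to entry.
    reference: dict mapping field name to a set of known-good entries.
    """
    results = []
    for listing in first_listings:
        for field, entry in listing.items():
            if entry in reference.get(field, set()):
                continue  # entry correlates with the reference listing
            processed, confidence = process_entry(field, entry)
            results.append((field, entry, processed, confidence))
    return results
```

In this sketch each result carries the field, the original entry, the processed entry and a confidence level, so that a later step can compare the confidence level against a threshold.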

[0059] In embodiments of the present invention, if the identified entries have a 
corresponding confidence level exceeding or meeting a threshold, then the 
processed entries are automatically modified, as shown in steps 4060-4070. In 
that case, the modified entries are output for processing, as shown in step 
4080. For example, the modified entries may be output to a confirmed pre- 
processed listings database 250 and/or to an automated attendant 190. 

[0060] If, in step 4060, the identified entries have a corresponding confidence 
level below the threshold, the processed entries are marked for operator 
confirmation, as shown in step 4090. The marked entries are presented to the 
operator for confirmation and/or further modification, as shown in step 4100. 
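
By way of illustration only, the threshold decision of steps 4060 through 4100 may be sketched in Python as follows. The threshold value, the dictionary keys and the function name are assumptions made for this sketch.

```python
THRESHOLD = 0.8  # hypothetical value; the specification does not fix one


def route(processed_entries, threshold=THRESHOLD):
    """Split processed entries into confirmed and operator-review lists."""
    confirmed = []      # e.g. destined for the confirmed listings database
    non_confirmed = []  # e.g. marked for operator confirmation
    for item in processed_entries:
        # Entries meeting or exceeding the threshold are accepted automatically.
        if item["confidence"] >= threshold:
            confirmed.append(item)
        else:
            non_confirmed.append(item)
    return confirmed, non_confirmed


entries = [
    {"entry": "12 Main Street", "confidence": 0.95},
    {"entry": "Saint Johns Deli", "confidence": 0.60},
    {"entry": "Doctor Smith", "confidence": 0.80},
]
confirmed, non_confirmed = route(entries)
```

Here an entry whose confidence level exactly equals the threshold is treated as confirmed, consistent with a confidence level "exceeding or meeting" the threshold.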

[0061] In embodiments of the present invention, the operator may use a GUI 
interface to check the entries. The operator may modify the entries using 
existing rules or the operator may modify the entry using new rules. In 
embodiments of the present invention, the operator may edit or update a rule 
and/or may add a new rule to the rules database 211. If the operator edits an 
existing rule and/or adds a new rule, previously modified entries may be 
processed using the updated rule and/or the new rule. Once the entries are 
modified by operator intervention, and/or a modified or new rule set, the 
modified entries are output for processing, as shown in step 4080. As indicated 
above, the modified entries may be output to a confirmed pre-processed listings 
database 250 and/or to an automated attendant 190. 

[0062] Several embodiments of the present invention are specifically illustrated 
and/or described herein. However, it will be appreciated that modifications and 
variations of the present invention are covered by the above teachings and 
within the purview of the appended claims without departing from the spirit and 
intended scope of the invention. 